'There's this deep mystery of what, actually, is this thing?': the philosopher inside Google DeepMind

The Guardian

Since 2017, Iason Gabriel has worked at the tech giant, trying to anticipate - and think through - the impact of AI. But as commercial and geopolitical pressures escalate, can ethicists make any difference? In 2017, a 33-year-old political philosopher named Iason Gabriel was told by a friend that he ought to apply for a job at DeepMind, the London-based subsidiary of Google where much of its AI research was concentrated. The suggestion was not an obvious one. Gabriel was a cheerful but intense junior academic with a passion for Vipassana meditation and what his brother calls "enthusiastic" rock climbing. At the University of Oxford, where he was a fellow at St John's College, Gabriel taught courses on political theory and wrote papers on the moral contortions of "yuppie ethics" and the ethical blind spots of effective altruism. When he wasn't there, he did crisis work for the United Nations Development Programme in Sudan and Lebanon. DeepMind, meanwhile, was the world's leading AI research lab. In part, this was because it had the financial and computational backing of Google, which had bought the company in 2014 for $650m. In part, it was because DeepMind had recently shown it could put those resources to stunning use. In Seoul, in 2016, a DeepMind system called AlphaGo defeated Lee Sedol, a South Korean Go champion, in a five-game match. The victory was significant not least because of Go's legendary complexity; the game has more possible configurations than there are atoms in the universe. Thanks to the fuss around AlphaGo, Gabriel was aware of DeepMind.


Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go

Neural Information Processing Systems

Large language models (LLMs) have demonstrated exceptional performance in reasoning tasks such as mathematics and coding, matching or surpassing human capabilities. However, these impressive reasoning abilities face significant challenges in specialized domains. Taking Go as an example, although AlphaGo has established the high performance ceiling of AI systems in Go, mainstream LLMs still struggle to reach even beginner-level proficiency, let alone perform natural language reasoning. This performance gap between general-purpose LLMs and domain experts significantly limits the application of LLMs to a wider range of domain-specific tasks. In this work, we aim to bridge the divide between LLMs' general reasoning capabilities and expert knowledge in domain-specific tasks. We perform mixed fine-tuning with structured Go expertise and general long Chain-of-Thought (CoT) reasoning data as a cold start, followed by reinforcement learning to integrate expert knowledge in Go with general reasoning capabilities. Through this methodology, we present LoGos, a powerful LLM that not only maintains outstanding general reasoning abilities, but also conducts Go gameplay in natural language, demonstrating effective strategic reasoning and accurate next-move prediction. LoGos achieves performance comparable to human professional players, substantially surpassing all existing LLMs. Through this work, we aim to contribute insights on applying general LLM reasoning capabilities to specialized domains.


Can Large Language Models Master Complex Card Games?

Neural Information Processing Systems

Complex games have long been an important benchmark for testing the progress of artificial intelligence algorithms. AlphaGo, AlphaZero, and MuZero have defeated top human players in Go and Chess, garnering widespread societal attention towards artificial intelligence. Concurrently, large language models (LLMs) have exhibited remarkable capabilities across various tasks, raising the question of whether LLMs can achieve similar success in complex games. In this paper, we explore the potential of LLMs in mastering complex card games. We systematically assess the learning capabilities of LLMs across eight diverse card games, evaluating the impact of fine-tuning on high-quality gameplay data, and examining the models' ability to retain general capabilities while mastering these games. Our findings indicate that: (1) LLMs can approach the performance of strong game AIs through supervised fine-tuning on high-quality data, (2) LLMs can achieve a certain level of proficiency in multiple complex card games simultaneously, with performance augmentation for games with similar rules and conflicts for dissimilar ones, and (3) LLMs experience a decline in general capabilities when mastering complex games, but this decline can be mitigated by integrating a certain amount of general instruction data. The evaluation results demonstrate strong learning ability and versatility of LLMs. The code is available at https://github.com/THUDM/
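The mitigation in finding (3), blending general instruction data into the gameplay fine-tuning set, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `mix_datasets`, the `general_fraction` parameter, and the toy records are all assumptions made for illustration, and the abstract does not state what mixing ratio was used.

```python
import random


def mix_datasets(game_examples, general_examples, general_fraction, seed=0):
    """Return a shuffled list in which roughly `general_fraction` of the
    examples are drawn from `general_examples` (hypothetical helper)."""
    if not 0.0 <= general_fraction < 1.0:
        raise ValueError("general_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Number of general examples needed so that they make up the requested
    # share of the final mixture, capped by how many are available.
    n_general = round(len(game_examples) * general_fraction / (1.0 - general_fraction))
    n_general = min(n_general, len(general_examples))
    mixed = list(game_examples) + rng.sample(list(general_examples), n_general)
    rng.shuffle(mixed)
    return mixed


# Toy records standing in for real training examples.
game = [{"source": "game", "id": i} for i in range(80)]
general = [{"source": "general", "id": i} for i in range(100)]
mixed = mix_datasets(game, general, general_fraction=0.2)
```

Here all 80 gameplay examples are kept and 20 general examples are sampled, so general data makes up one fifth of the mixture. Any real ratio would need to be tuned against both game strength and general-capability benchmarks.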


A Game Plan for the AI Boom

The Atlantic - Technology

Ten years ago, AlphaGo trounced human competitors--and its legacy is still present in today's most advanced bots. Thore Graepel may have been the first human to be vanquished by a superintelligence. In 2015, on his first day as a researcher at Google DeepMind, he was challenged to play against the earliest iteration of AlphaGo--a computer program developed by DeepMind that would prove so effective at the ancient-Chinese game of weiqi (or Go, as it is commonly known in the West) that it changed how humans play it, and then upended the field of AI itself. When Graepel faced it, AlphaGo was just a "baby" project, as he put it to me, and he was an accomplished amateur player. But it still took him down.


How an intern helped build the AI that shook the world

New Scientist

Chris Maddison was just an intern when he started working on the Go-playing AI that would eventually become AlphaGo. In March 2016, Google DeepMind's artificial intelligence system AlphaGo shocked the world. In a stunning five-match series of Go, the ancient Chinese board game, the AI beat the world's best player, Lee Sedol - a moment that was televised in front of millions and hailed by many as a historic moment in the development of artificial intelligence. Chris Maddison, now a professor of artificial intelligence at the University of Toronto, was then a master's student and helped get the project off the ground. Alex Wilkins: How did the idea for AlphaGo first come about?


The moment that kicked off the AI revolution

New Scientist

Has the technology lived up to its potential? The first time that AlphaGo revealed its full power, it prompted a visceral reaction. Lee Sedol, the world's greatest player of the ancient Chinese board game Go, had grown visibly agitated at the artificial intelligence's prowess. The hushed crowd in downtown Seoul, South Korea, could barely contain its gasps. It was quickly dawning on Lee, and the tens of millions watching at home, that this AI was different to those that had come before. It wasn't just beating Lee, but it was doing so with an almost human-like aptitude.


ReST-MCTS: LLM Self-Training via Process Reward Guided Tree Search

Dan Zhang

Neural Information Processing Systems

Recent methodologies in LLM self-training mostly rely on LLM generating responses and filtering those with correct output answers as training data. This approach often yields a low-quality fine-tuning training set (e.g., incorrect plans or intermediate reasoning).
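The weakness described above can be seen in a minimal sketch of answer-only filtering. This illustrates the baseline being criticized, not the ReST-MCTS method itself; the function `filter_by_final_answer` and the record fields (`question_id`, `reasoning`, `answer`) are hypothetical names chosen for the example.

```python
def filter_by_final_answer(samples, reference_answers):
    """Keep every sampled response whose final answer matches the reference.

    The intermediate reasoning is never inspected, which is exactly why
    flawed traces can slip into the training set.
    """
    kept = []
    for sample in samples:
        if sample["answer"] == reference_answers[sample["question_id"]]:
            kept.append(sample)
    return kept


reference_answers = {"q1": "4"}
samples = [
    {"question_id": "q1", "reasoning": "2 + 2 = 4", "answer": "4"},
    # Wrong intermediate step, but the final answer happens to be right.
    {"question_id": "q1", "reasoning": "2 + 2 = 5, then 5 - 1 = 4", "answer": "4"},
    {"question_id": "q1", "reasoning": "2 + 2 = 5", "answer": "5"},
]
kept = filter_by_final_answer(samples, reference_answers)
```

The second sample survives the filter even though its reasoning is incorrect. Catching such cases requires some signal about the intermediate steps, which is the role a process reward is meant to play.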


AlphaFold Changed Science. After 5 Years, It's Still Evolving

WIRED

WIRED spoke with DeepMind's Pushmeet Kohli about the recent past--and promising future--of the Nobel Prize-winning research project that changed biology and chemistry forever. Over the past few years, we've periodically reported on its successes; last year, it won the Nobel Prize in Chemistry. Until AlphaFold's debut in November 2020, DeepMind had been best known for teaching an artificial intelligence to beat human champions at the ancient game of Go. Its work culminated in the compilation of a database that now contains over 200 million predicted structures, essentially the entire known protein universe, and is used by nearly 3.5 million researchers in 190 countries around the world. The Nature article published in 2021 describing the algorithm has been cited 40,000 times to date. Last year, AlphaFold 3 arrived, extending the capabilities of artificial intelligence to DNA, RNA, and drugs.


Are AlphaZero-like Agents Robust to Adversarial Perturbations?

Neural Information Processing Systems

The success of AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin. Given that the state space of Go is extremely large and a human player can play the game from any legal state, we ask whether adversarial states exist for Go AIs that may lead them to play surprisingly wrong actions. In this paper, we first extend the concept of adversarial examples to the game of Go: we generate perturbed states that are "semantically" equivalent to the original state by adding meaningless moves to the game, and an adversarial state is a perturbed state leading to an undoubtedly inferior action that is obvious even to Go beginners. However, searching for adversarial states is challenging due to the large, discrete, and non-differentiable search space. To tackle this challenge, we develop the first adversarial attack on Go AIs that can efficiently search for adversarial states by strategically reducing the search space. This method can also be extended to other board games such as NoGo. Experimentally, we show that the actions taken by both the Policy-Value neural network (PV-NN) and Monte Carlo tree search (MCTS) can be misled by adding one or two meaningless stones; for example, on 58% of the AlphaGo Zero self-play games, our method can make the widely used KataGo agent with 50 simulations of MCTS play a losing action by adding two meaningless stones. We additionally evaluated the adversarial examples found by our algorithm with amateur human Go players, and 90% of examples indeed lead the Go agent to play an obviously inferior action.
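The idea of an adversarial state can be made concrete with a brute-force sketch. This is not the paper's attack, which reduces the search space strategically; it simply tries every candidate point. The agent (`best_move`), the evaluator (`move_value`), the board encoding, and the `min_drop` threshold are all hypothetical stand-ins, and the sketch does not check Go legality or whether an added stone is truly "meaningless".

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Point = Tuple[int, int]
Board = Dict[Point, str]  # occupied points mapped to "B" or "W"


def find_one_stone_perturbation(
    board: Board,
    candidate_points: Iterable[Point],
    best_move: Callable[[Board], Point],          # hypothetical agent under attack
    move_value: Callable[[Board, Point], float],  # hypothetical trusted evaluator
    min_drop: float,
) -> Optional[Tuple[Point, str, Point]]:
    """Return (added_point, colour, new_move) for the first single added stone
    that makes the agent switch to a move worth at least `min_drop` less than
    its original choice, or None if no such stone is found."""
    original_move = best_move(board)
    for point in candidate_points:
        if point in board:
            continue  # only empty points can receive a stone
        for colour in ("B", "W"):
            perturbed = dict(board)
            perturbed[point] = colour
            new_move = best_move(perturbed)
            if new_move == original_move:
                continue
            drop = move_value(perturbed, original_move) - move_value(perturbed, new_move)
            if drop >= min_drop:
                return point, colour, new_move
    return None


# Toy agent: plays (0, 0) unless a stone sits on (2, 2), which confuses it.
def toy_best_move(board: Board) -> Point:
    return (1, 1) if (2, 2) in board else (0, 0)


# Toy evaluator: (0, 0) is the only good move in every position.
def toy_move_value(board: Board, move: Point) -> float:
    return 1.0 if move == (0, 0) else 0.0


result = find_one_stone_perturbation(
    {}, [(2, 1), (2, 2)], toy_best_move, toy_move_value, min_drop=0.5
)
```

With the toy agent, the search reports that a black stone on (2, 2) flips the chosen move to (1, 1). On a real 19x19 board with two added stones, exhaustive enumeration of this kind becomes expensive, which is the difficulty the abstract points to.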